AITopics | gradient descent and momentum

On discretisation drift and smoothness regularisation in neural network training

Rosca, Mihaela Claudia

arXiv.org Machine Learning, Oct-21-2023

The deep learning recipe of casting real-world problems as mathematical optimisation and tackling the optimisation by training deep neural networks using gradient-based optimisation has undoubtedly proven to be a fruitful one. The understanding behind why deep learning works, however, has lagged behind its practical significance. We aim to make steps towards an improved understanding of deep learning with a focus on optimisation and model regularisation. We start by investigating gradient descent (GD), a discrete-time algorithm at the basis of most popular deep learning optimisation algorithms. Understanding the dynamics of GD has been hindered by the presence of discretisation drift, the numerical integration error between GD and its often studied continuous-time counterpart, the negative gradient flow (NGF). To add to the toolkit available to study GD, we derive novel continuous-time flows that account for discretisation drift. Unlike the NGF, these new flows can be used to describe learning rate specific behaviours of GD, such as training instabilities observed in supervised learning and two-player games. We then translate insights from continuous time into mitigation strategies for unstable GD dynamics, by constructing novel learning rate schedules and regularisers that do not require additional hyperparameters. Like optimisation, smoothness regularisation is another pillar of deep learning's success with wide use in supervised learning and generative modelling. Despite their individual significance, the interactions between smoothness regularisation and optimisation have yet to be explored. We find that smoothness regularisation affects optimisation across multiple deep learning domains, and that incorporating smoothness regularisation in reinforcement learning leads to a performance boost that can be recovered using adaptions to optimisation methods.
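
To make the idea of discretisation drift concrete, the following is a minimal sketch rather than anything taken from the paper. It assumes a one-dimensional quadratic loss f(theta) = 0.5 * lam * theta**2, for which the negative gradient flow has the closed-form solution theta(t) = theta0 * exp(-lam * t). The function names and the parameters `lam` (curvature) and `h` (learning rate) are illustrative choices. The gap between the GD iterate after k steps and the flow at time k * h is the drift, and it grows with the learning rate.

```python
import math


def gd_iterates(theta0, lam, h, steps):
    """Run gradient descent on f(theta) = 0.5 * lam * theta**2 (assumed toy loss)."""
    thetas = [theta0]
    for _ in range(steps):
        grad = lam * thetas[-1]               # f'(theta) = lam * theta
        thetas.append(thetas[-1] - h * grad)  # discrete GD update
    return thetas


def ngf_solution(theta0, lam, t):
    """Exact negative gradient flow for the same loss: theta0 * exp(-lam * t)."""
    return theta0 * math.exp(-lam * t)


def drift(theta0, lam, h, steps):
    """Difference between GD after `steps` updates and the flow at time steps * h."""
    gd_final = gd_iterates(theta0, lam, h, steps)[-1]
    return gd_final - ngf_solution(theta0, lam, steps * h)


# Same total time (t = 1), two different learning rates.
small_lr_drift = abs(drift(1.0, 1.0, 0.01, 100))
large_lr_drift = abs(drift(1.0, 1.0, 0.5, 2))

# With h * lam > 2 the GD iterates oscillate and grow, while the flow
# still decays monotonically; the NGF cannot describe this instability.
unstable = gd_iterates(1.0, 1.0, 2.5, 10)

print(small_lr_drift, large_lr_drift, unstable[-1])
```

On this toy problem each GD step multiplies theta by (1 - h * lam), whereas the flow multiplies it by exp(-h * lam) over the same interval. The two agree only to first order in h, which is one way to see why a flow that ignores the learning rate misses learning-rate-specific behaviour.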

artificial intelligence, discretisation drift help test performance, machine learning, (19 more...)

arXiv:2310.14036

Country:

North America > United States (0.45)
Asia (0.14)
Europe (0.14)
North America > Canada (0.13)

Genre:

Research Report > New Finding (1.00)
Overview (0.92)
Instructional Material (0.92)

Industry:

Education (1.00)
Health & Medicine (0.67)
Leisure & Entertainment > Games > Computer Games (0.67)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Gentle Introduction to Gradient Descent and Momentum

#artificialintelligence, Jul-18-2021, 09:15:52 GMT

In this article, we will talk about a fundamental concept in machine learning called gradient descent. Gradient descent is one of the most popular algorithms for reducing prediction error, i.e., minimizing your cost function. This might sound confusing, but that's okay; before we jump into more details, I'll give a very brief idea of where it is mostly used. In deep learning, we have a concept called backpropagation. Wikipedia says, "backpropagation computes the gradient of the loss function with respect to the weights of the network for a single input–output example, and does so efficiently, unlike a naïve direct computation of the gradient with respect to each weight individually." I had a brain-freeze when I read this, so let me give you an intuitive example to help you understand better.
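
For readers who prefer code to definitions, here is a minimal sketch of what "minimizing a cost function" with gradient descent can look like. It is an illustration under stated assumptions, not the article's own example: the model is a hypothetical one-parameter line y = w * x, the cost is mean squared error, and the gradient is written out by hand (in a neural network, backpropagation is what would compute it). The `momentum` argument uses the classical heavy-ball form, which is one common formulation; setting it to 0 gives plain gradient descent.

```python
def cost(w, xs, ys):
    """Mean squared error of the assumed model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cost_gradient(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def minimize(xs, ys, learning_rate=0.05, momentum=0.0, steps=200):
    """Gradient descent on w, with optional heavy-ball momentum."""
    w = 0.0
    velocity = 0.0
    for _ in range(steps):
        # The velocity keeps a decaying memory of past gradient steps;
        # with momentum=0.0 it is just the current gradient step.
        velocity = momentum * velocity - learning_rate * cost_gradient(w, xs, ys)
        w += velocity
    return w


# Toy data generated by y = 2 * x, so the best weight is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w_plain = minimize(xs, ys)
w_momentum = minimize(xs, ys, momentum=0.9)
print(w_plain, w_momentum, cost(w_plain, xs, ys))
```

Each update moves `w` a small step against the gradient, so the cost shrinks until `w` settles near the value that fits the data. The learning rate and momentum values here are arbitrary choices that happen to be stable for this toy data.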

gradient descent, gradient descent and momentum, minima, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.93)
